Bulgarian-English Parallel Treebank: Word and Semantic Level Alignment
Abstract
The paper describes the basic strategies behind the word and semantic level alignment in the Bulgarian-English treebank. The word level alignment takes into account the experience of other NLP groups, adapted to the specific features of the Bulgarian language. The semantic level alignment builds on the word level alignment and is represented in the framework of the Minimal Recursion Se-
Similar resources
Linguistic Issues in Language Technology – LiLT
The paper describes the construction of a Bulgarian-English treebank aligned on the word and semantic level. We consider the manual word level alignment easier and more reliable than the manual alignment on syntactic and semantic level. Thus, after manual word level alignment we apply an automatic procedure for the construction of semantic level alignments. Our work presents the main steps of t...
Creating Arabic-English Parallel Word-Aligned Treebank Corpora at LDC
This contribution describes an Arabic-English parallel word aligned treebank corpus from the Linguistic Data Consortium that is currently under production. Herein we primarily focus on efforts required to assemble the package and instructions for using it. It was crucial that word alignment be performed on tokens produced during treebanking to ensure cohesion and greater utility of the corpus. ...
Language engineering for syntactic knowledge transfer
In this paper we present a method for an English-Romanian treebank construction, together with the obtained evaluation results. The treebank is built upon a parallel English-Romanian corpus word-aligned and annotated at the morphological and syntactic level. The syntactic trees of the Romanian texts are generated by considering the syntactic phrases of the English parallel texts automatically r...
Ontology-Supported Text Classification Based on Cross-Lingual Word Sense Disambiguation
The paper reports on recent experiments in cross-lingual document processing (with a case study for Bulgarian-English-Romanian language pairs) and brings evidence on the benefits of using linguistic ontologies for achieving, with a high level of accuracy, difficult tasks in NLP such as word alignment, word sense disambiguation, document classification, cross-language information retrieval, etc....
A Model for Fine-Grained Alignment of Multilingual Texts
While alignment of texts on the sentential level is often seen as being too coarse, and word alignment as being too fine-grained, bi- or multilingual texts which are aligned on a level in between are a useful resource for many purposes. Starting from a number of examples of non-literal translations, which tend to make alignment difficult, we describe an alignment model which copes with these cases...